Concepedia

Concept: data-intensive computing

4.9K Publications, 343.9K Citations, 14.7K Authors, 2.3K Institutions

Data-Intensive Computing

1999 - 2007

This period saw a shift toward managing and processing large-scale data across distributed, heterogeneous environments, motivated by the need to minimize data movement, exploit data locality, and run scalable analytics on commodity hardware. Characteristic techniques include decoupled task placement, grid-aware scheduling, efficient data transport, and compact data representations, all aimed at coping with the growing deluge of scientific and industrial data.

Historical Significance: This period laid the foundations for data-intensive workflows, reproducible computing, and the distributed data infrastructures that later underpinned big-data and high-performance analytics. It also saw early work on genome-scale data handling, wide-area data access, and probabilistic counting over large data streams.

Data-grid resource management and scheduling patterns that couple computation with data locality: decoupled task placement, explicit data-placement strategies, and grid schedulers that minimize data movement and balance load across distributed resources [1], [17], [16], [4], [18].
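To make the coupling of computation and data locality concrete, here is a minimal greedy placement sketch. It assumes a hypothetical cost model (bytes that must be moved to a site plus a weighted penalty for that site's current load) and assumes that inputs fetched to a site stay cached there. The classes, the `place_tasks` function, and the site and dataset names are illustrative, not taken from the cited schedulers.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    load: float = 0.0  # work already assigned to this site (arbitrary units)


@dataclass
class Task:
    name: str
    inputs: list[str]  # names of the datasets the task reads
    work: float = 1.0  # compute cost added to the chosen site's load


def place_tasks(tasks, sites, replicas, sizes, load_weight=1.0):
    """Greedily place each task where (bytes moved + load_weight * site load) is smallest.

    replicas: dataset name -> set of site names holding a copy
    sizes:    dataset name -> dataset size (any consistent unit)
    Returns {task name: site name} and updates each chosen Site's load.
    """
    replicas = {d: set(s) for d, s in replicas.items()}  # private copy; caller's dict untouched
    placement = {}
    for task in tasks:
        def cost(site, task=task):
            moved = sum(sizes[d] for d in task.inputs if site.name not in replicas[d])
            return moved + load_weight * site.load

        best = min(sites, key=lambda s: (cost(s), s.name))  # ties broken by site name
        best.load += task.work
        placement[task.name] = best.name
        # Simplifying assumption: inputs fetched to the chosen site stay cached there.
        for d in task.inputs:
            replicas[d].add(best.name)
    return placement


sites = [Site("site-a"), Site("site-b"), Site("site-c")]
replicas = {"raw-A": {"site-a"}, "raw-B": {"site-b"}, "calib": {"site-a", "site-b"}}
sizes = {"raw-A": 500.0, "raw-B": 800.0, "calib": 5.0}
tasks = [Task("t1", ["raw-A", "calib"]), Task("t2", ["raw-B", "calib"])]
print(place_tasks(tasks, sites, replicas, sizes))  # {'t1': 'site-a', 't2': 'site-b'}
```

Each task lands where its large inputs already reside. When inputs are cheap to move, the load term dominates and spreads tasks across sites instead.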

Compression and compact representations for high-dimensional data that relieve storage and I/O bottlenecks, including condensed data cubes, cube structures that remove prefix and suffix redundancy, and other highly condensed cube representations, applied across diverse scientific datasets [2], [3], [13], [14], [12].
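Prefix-redundancy removal can be illustrated with a prefix tree over the dimension values of the fact tuples: tuples that share leading dimension values store that prefix once, and each node keeps the aggregate of everything beneath it. The following is a minimal sketch under that assumption. It answers only group-bys on leading dimensions and omits the suffix coalescing and "ALL" cells that full condensed cube structures add. The class and function names are hypothetical.

```python
class PrefixNode:
    """One node per distinct prefix of dimension values; stores the SUM of its subtree."""
    __slots__ = ("total", "children")

    def __init__(self):
        self.total = 0.0
        self.children = {}


def build_prefix_tree(facts):
    """facts: iterable of (tuple_of_dimension_values, measure)."""
    root = PrefixNode()
    for dims, measure in facts:
        node = root
        node.total += measure
        for value in dims:
            node = node.children.setdefault(value, PrefixNode())  # shared prefixes reuse nodes
            node.total += measure
    return root


def prefix_sum(root, prefix):
    """Aggregate over all facts whose leading dimensions equal `prefix` (0.0 if none)."""
    node = root
    for value in prefix:
        node = node.children.get(value)
        if node is None:
            return 0.0
    return node.total


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


facts = [
    (("2005", "EU", "disk"), 10.0),
    (("2005", "EU", "tape"), 4.0),
    (("2005", "US", "disk"), 7.0),
    (("2006", "EU", "disk"), 3.0),
]
root = build_prefix_tree(facts)
print(prefix_sum(root, ("2005",)))       # 21.0
print(prefix_sum(root, ("2005", "EU")))  # 14.0
# Dimension values stored in the tree vs. in the flat tuple list:
print(count_nodes(root) - 1, "vs", sum(len(d) for d, _ in facts))  # 9 vs 12
```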

Data transport, replication management, and remote access for large-scale datasets in distributed environments, emphasizing secure and efficient transfer, consistency across replicas, and scalable wide-area data access [4], [8], [16].
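A common way to check consistency across replicas without re-sending a whole dataset is to compare per-block checksums and transfer only the blocks that differ. The sketch below shows that generic technique with SHA-256 over fixed-size blocks. It is not the API of any grid transfer tool, and the function names and 4 KiB default block size are assumptions.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[str]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def stale_blocks(master: bytes, replica: bytes, block_size: int = 4096) -> list[int]:
    """Indices of master blocks that must be re-sent so `replica` matches `master`."""
    m = block_digests(master, block_size)
    r = block_digests(replica, block_size)
    differing = [i for i, (a, b) in enumerate(zip(m, r)) if a != b]
    # Blocks present in the master but missing from a shorter replica also need sending.
    differing.extend(range(len(r), len(m)))
    return differing


master = bytes(range(256)) * 64              # 16 KiB -> four 4 KiB blocks
replica = bytearray(master)
replica[5000] ^= 0xFF                        # corrupt one byte inside block 1
print(stale_blocks(master, bytes(replica)))  # [1]
```

Only digests cross the network during the comparison, so the cost of a consistency check scales with the number of blocks rather than the dataset size. A replica longer than the master would additionally need truncating, which this sketch does not handle.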

Multi-dimensional data processing and indexing architectures for scientific datasets, integrating storage, retrieval, and computation across distributed memories and disks to support fast, scalable analytics [6], [10], [15], [14], [11].
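One widely used approach for scientific arrays is to store the data in fixed-size chunks, each with a bounding box, so that a range query reads only the chunks it overlaps. Below is a minimal two-dimensional sketch under that assumption. The chunk layout and function names are hypothetical, and a real system would locate overlapping chunks through an index rather than the linear scan used here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    x0: int  # inclusive lower corner
    y0: int
    data: list[list[float]]  # data[dy][dx] holds cell (x0 + dx, y0 + dy)

    @property
    def x1(self) -> int:  # exclusive upper corner
        return self.x0 + len(self.data[0])

    @property
    def y1(self) -> int:
        return self.y0 + len(self.data)


def make_chunks(grid, size):
    """Split a 2-D grid (list of rows, grid[y][x]) into size x size chunks."""
    chunks = []
    for y0 in range(0, len(grid), size):
        for x0 in range(0, len(grid[0]), size):
            block = [row[x0:x0 + size] for row in grid[y0:y0 + size]]
            chunks.append(Chunk(x0, y0, block))
    return chunks


def range_sum(chunks, qx0, qy0, qx1, qy1):
    """Sum cells with qx0 <= x < qx1 and qy0 <= y < qy1; return (sum, chunks_read)."""
    total, touched = 0.0, 0
    for c in chunks:
        # Bounding-box test: skip chunks that cannot intersect the query window.
        if c.x1 <= qx0 or c.x0 >= qx1 or c.y1 <= qy0 or c.y0 >= qy1:
            continue
        touched += 1
        for y in range(max(c.y0, qy0), min(c.y1, qy1)):
            for x in range(max(c.x0, qx0), min(c.x1, qx1)):
                total += c.data[y - c.y0][x - c.x0]
    return total, touched


grid = [[float(x + 10 * y) for x in range(8)] for y in range(8)]
chunks = make_chunks(grid, 4)              # four 4x4 chunks
print(range_sum(chunks, 0, 0, 3, 3))       # (99.0, 1): only the first chunk is read
```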

Virtual data, provenance, and derivation frameworks enabling reproducible workflows and on-demand data generation in grid-enabled science, documenting procedures and relationships among data products [9], [17].
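The virtual-data idea can be sketched as a catalog that records how each data product is derived, materializes a product only when it is first requested, and can report the derivation chain behind any product. The minimal Python sketch below follows that description. The `VirtualDataCatalog` class, its methods, and the example transformations are hypothetical and do not reproduce the interface of any cited system.

```python
from typing import Callable


class VirtualDataCatalog:
    """Records derivation recipes and materializes data products on demand."""

    def __init__(self):
        self._recipes = {}       # product -> (transformation name, function, input products)
        self._materialized = {}  # product -> value already stored or computed
        self.runs = []           # transformations actually executed, in order

    def add_raw(self, name, value):
        self._materialized[name] = value

    def define(self, name, transform: Callable, inputs, transform_name=None):
        self._recipes[name] = (transform_name or transform.__name__, transform, tuple(inputs))

    def get(self, name):
        """Return a product, deriving it (and any missing inputs) on first request."""
        if name not in self._materialized:
            tname, fn, inputs = self._recipes[name]  # KeyError if neither raw nor derivable
            args = [self.get(i) for i in inputs]
            self._materialized[name] = fn(*args)
            self.runs.append(tname)
        return self._materialized[name]

    def provenance(self, name):
        """Nested record of which transformation produced `name` from which inputs."""
        if name not in self._recipes:
            return {"product": name, "source": "raw"}
        tname, _, inputs = self._recipes[name]
        return {"product": name, "transformation": tname,
                "inputs": [self.provenance(i) for i in inputs]}


catalog = VirtualDataCatalog()
catalog.add_raw("readings", [3.0, 1.0, 2.0])


def calibrate(xs):
    return [2 * x for x in xs]


def mean(xs):
    return sum(xs) / len(xs)


catalog.define("calibrated", calibrate, ["readings"])
catalog.define("summary", mean, ["calibrated"])
print(catalog.get("summary"))  # 4.0, derived on demand
print(catalog.runs)            # ['calibrate', 'mean']
```

Because the recipe, not just the result, is recorded, a product can be regenerated or audited later, which is the link between virtual data and reproducible workflows.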

MapReduce-to-DAG Data Analytics

2008 - 2014

Dataflow-Driven Distributed Analytics

2015 - 2016

Memory-Centric Data Analytics

2017 - 2023